
Issues and Solutions in Bringing Heterogeneous Water 





James Acker¹, Steven Kempler², William Teng¹, Deborah Belvedere³, Zhong Liu⁴, Gregory Leptoukh²


¹ NASA Goddard Space Flight Center/SESDA2, ² NASA Goddard Space Flight Center, ³ UMBC/GEST, ⁴ GMU


Steven.J.Kempler@nasa.gov 




Workshop Results (http://news.cisc.gmu.edu/cewisworkshop.htm) 
















Abstract 


The water cycle research community has generated many regional-to-global-scale products using data from

individual NASA missions or sensors (e.g., TRMM, AMSR-E); multiple ground- and space-based data sources 


(e.g., Global Precipitation Climatology Project [GPCP] products); and sophisticated data assimilation systems (e.g., 

Land Data Assimilation Systems [LDAS]). 



However, it is often difficult to access, explore, merge, analyze, and intercompare these data in a coherent 


manner due to issues of data resolution, format, and structure. 


These difficulties were substantiated at the recent Collaborative Energy and Water Cycle Information 


Services (CEWIS) Workshop, sponsored by NASA Energy and Water cycle Study (NEWS) Program Manager 


Jared Entin, where members of the NEWS community gave presentations, provided feedback, and developed 


scenarios that illustrated the difficulties of, and techniques for, bringing together heterogeneous datasets.


This presentation reports on the findings of the workshop, thus defining the problems and challenges of 


multi-dataset research. In addition, the CEWIS prototype shown at the workshop will be presented to 


illustrate new technologies that can mitigate data access roadblocks encountered in multi-dataset research, 


including: 


-Quick and easy search and access of selected NEWS data sets.


-Multi-parameter data subsetting, manipulation, analysis, and display tools.


-Access to input and derived water cycle data (data lineage).


It is hoped that this presentation will encourage community discussion and feedback on heterogeneous data

analysis scenarios, issues, and remedies.


Challenges/Issues in Bringing Together and
Utilizing Heterogeneous Data Sets


From CEWIS Workshop:

Steps taken to gather and prepare data for multi-data set intercomparisons

Responses:

Retrieving Data

• Identify the sources of data, found by searching data center archives and the web






[Screenshot: CEWIS Portal overview page (Water Cycle Data Portal) at the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC)]

Collaborative Energy and Water Cycle Information Services 
The goal of the Collaborative Energy and Water Cycle Information Services (CEWIS) endeavor is to reduce the time and
resources spent by scientists on data acquisition and data management, and thus to facilitate energy and water cycle
research by providing easy access to, and cross-data set manipulation tools for, a community-driven inventory of energy
and water cycle data products.


CEWIS seeks to address the overarching question: What data services does the energy and water cycle community need 
to become more efficient in conducting their science research with multiple and disparate data sets? 
Thus, some of the CEWIS objectives are:

■ To facilitate water cycle research by bringing together heterogeneous datasets;

■ To save researchers valuable time and reduce the frustration of having to locate and [...]
interest from various sites;

■ To enable the processing of energy and water cycle data into information: subsetting [...]
preliminary analyses, co-registration, etc., for specific data of user interest; and

■ To build upon the NEWS Data Information Center (NDIC) and NASA data archives w[...]
further facilitate the usage and usability of energy and water cycle data.
Select a product from the drop-down list and it will display the data information and access methods. To use the

Product Information


Shortname: CMORPHRAIN 

Description: CMORPH 0.25 degree 3-hourly precipitation estimates 
Data Format: 3-Hourly Product: CMORPHRAIN 




Data Access Methods 


Mirador 

A simplified, clean interface that employs the Google Mini appliance for metadata keyword searches.
• Find relevant data with the desired data characteristics

• Get samples of data; sort out data ‘quirks’; determine if data are 
readable/correct 

• Check units, timestamp, quality control flags 

• Understand data characteristics (format, time period and resolution) 

Assembling Data 

• Collocate from different instruments 

• Bring data sets to common grid (interpolate, collocate) 

Analyzing Data 

• Acquire data read code 

• Perform data subsetting 

• Perform data intercomparisons 

• Homogenize different data sets for objective comparisons 
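
The assembling and analyzing steps above (bring data sets to a common grid, then intercompare) can be sketched as follows. This is a minimal illustration only, not the CEWIS or Giovanni implementation: the function names `block_average` and `grid_difference`, the tiny sample grids, and the use of `None` for missing cells are all assumptions made for the example. The averaging is unweighted, so it ignores the change of cell area with latitude, which a real regridding step would likely need to account for.

```python
from statistics import mean


def block_average(grid, factor):
    """Coarsen a 2-D grid (list of rows) by averaging factor x factor blocks.

    Missing cells are represented by None and skipped; a block with no
    valid cells yields None. Averaging is unweighted (no area weighting).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            cells = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)
                     if grid[i][j] is not None]
            row.append(mean(cells) if cells else None)
        coarse.append(row)
    return coarse


def grid_difference(a, b):
    """Cell-by-cell difference a - b on a shared grid; None where either is missing."""
    return [[None if x is None or y is None else x - y
             for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Hypothetical data: a 4x4 fine grid (e.g. 0.25-degree cells) and a
# 2x2 coarse grid (0.5-degree cells) covering the same area.
fine = [[1.0, 3.0, 0.0, 0.0],
        [5.0, 7.0, None, 4.0],
        [2.0, 2.0, 1.0, 1.0],
        [2.0, 2.0, 1.0, 1.0]]
coarse = [[3.0, 1.0],
          [2.0, 2.0]]

fine_on_coarse = block_average(fine, 2)        # bring to the common grid
diff = grid_difference(fine_on_coarse, coarse)  # intercompare
```

Coarsening the finer product to the coarser grid, rather than interpolating the coarser one upward, avoids inventing detail that the coarse product never contained.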


Roadblocks encountered when bringing heterogeneous data sets 
together 

Responses: 

Data Access 

• Finding and gaining access to the data 

• Data sets tend to be organized on a project-specific basis, making it difficult to 
know what other data sets might be applicable to a given problem 

• Lack of a nice "search engine" to quickly locate the data 

Data Characteristics 

• Learning how to read data correctly 

• Data volume download interruption 

• Spatially and temporally subsetting data 
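
As an illustration of the spatial subsetting roadblock, the sketch below selects the cells of a regular latitude/longitude grid whose centres fall inside a bounding box. It is an assumption-laden example rather than any archive's actual subsetter: the `(start, step, count)` axis description, the function names, and the sample grid are hypothetical. Temporal subsetting could be handled the same way along a time axis.

```python
def subset_indices(start, step, count, lo, hi):
    """Indices k whose cell centre start + k*step lies within [lo, hi]."""
    return [k for k in range(count) if lo <= start + k * step <= hi]


def subset_grid(grid, lat_axis, lon_axis, bbox):
    """Return the sub-grid whose cell centres fall inside bbox.

    lat_axis / lon_axis: (start, step, count) for cell centres (assumed layout).
    bbox: (south, north, west, east) in degrees.
    """
    south, north, west, east = bbox
    rows = subset_indices(*lat_axis, south, north)
    cols = subset_indices(*lon_axis, west, east)
    return [[grid[r][c] for c in cols] for r in rows]


# Hypothetical 4x4 grid of 0.25-degree cells; each value encodes row*10 + column.
grid = [[r * 10 + c for c in range(4)] for r in range(4)]
lat_axis = (10.125, 0.25, 4)     # centres 10.125, 10.375, 10.625, 10.875
lon_axis = (-100.125, 0.25, 4)   # centres -100.125, -99.875, -99.625, -99.375

region = subset_grid(grid, lat_axis, lon_axis, (10.3, 10.9, -100.0, -99.5))
```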

Combining Datasets 

• Finding collocated data sets 

• Converting data of different formats 

• Properly converting data to common grid 

• Users are expected to have knowledge of multiple sensors 

• Dealing with different spatial/temporal resolutions, spanning periods, aspects of 
heterogeneous data sets, instrument fields of view 

Verifying Combined Data sets 

• Quantifying errors introduced during interpolation 

• Distinguishing natural signal from systematic errors 

• Time/space gridding mismatches across data sets; reconciling data that are not 
uniform in space and time 
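
One simple way to put a number on the error introduced by interpolation is a withholding test: drop each interior sample in turn, rebuild it from its neighbours, and summarize the reconstruction errors. The sketch below does this for linear interpolation in time. It is offered as one possible approach under stated assumptions, not as the method used by workshop participants; the function names and the sample 3-hourly series are hypothetical.

```python
import math


def linear_interp(x0, y0, x1, y1, x):
    """Linearly interpolate between (x0, y0) and (x1, y1) at position x."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def withheld_rmse(times, values):
    """RMSE of rebuilding each interior sample from its two neighbours.

    Each interior value is withheld, estimated by linear interpolation
    between the samples on either side, and compared with the true value.
    """
    if len(times) < 3:
        raise ValueError("need at least three samples")
    errors = []
    for k in range(1, len(times) - 1):
        estimate = linear_interp(times[k - 1], values[k - 1],
                                 times[k + 1], values[k + 1], times[k])
        errors.append(estimate - values[k])
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical 3-hourly series (hours, value).
hours = [0, 3, 6, 9, 12]
series = [0.0, 1.0, 4.0, 3.0, 2.0]

rmse = withheld_rmse(hours, series)
```

A series that varies linearly in time gives an RMSE of zero, so a large value points to variability that the sampling interval does not resolve.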

Data Documentation 

• Undocumented features in data 

• Inadequate quality control, error estimation, detailed documentation of data 
Demonstrating a Solution: CEWIS

The goal of the Collaborative Energy and Water Cycle Information Services (CEWIS) is to develop a
community-based, evolving set of data and information services that help users locate,
access, and bring together multiple distributed heterogeneous energy and water cycle datasets.


Demonstration Purpose 

• To show data services for NEWS data sets that facilitate multi-dataset research

• To provide some starting points for discussion on NEWS multi-dataset analysis requirements


Demo Components 

• CEWIS Portal

• Data search and access (Mirador)

  - "Manual" option

  - OpenSearch options

    - Data provider has its own search engine

    - Data are published to ECHO

    - Data provider has installed the provided search engine

• Data visualization and analysis (Giovanni)

  - 3 instances (monthly, daily, 3-hourly)
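
For the OpenSearch options listed above, a client typically fills in a URL template published by the data provider. The sketch below shows that substitution step under explicit assumptions: the endpoint `https://example.invalid/opensearch` and the template string are hypothetical and do not describe Mirador's actual interface. The placeholder style (`{searchTerms}`, and a trailing `?` for optional parameters) follows the general OpenSearch URL template convention.

```python
import re
from urllib.parse import quote


def fill_template(template, params):
    """Substitute OpenSearch-style {name} and {name?} placeholders.

    A missing optional parameter ({name?}) becomes an empty string;
    a missing required parameter raises ValueError.
    """
    def repl(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(params[name]), safe="")
        if optional:
            return ""
        raise ValueError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", repl, template)


# Hypothetical template; a real provider would publish its own.
TEMPLATE = ("https://example.invalid/opensearch?q={searchTerms}"
            "&start={time:start?}&end={time:end?}&bbox={geo:box?}")

url = fill_template(TEMPLATE, {
    "searchTerms": "CMORPH precipitation",
    "time:start": "2007-01-01T00:00:00Z",
    "geo:box": "-130,20,-60,55",
})
```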


Thanks to our NEWS PIs, Demo Collaborators

• Eric Fetzer - Merged Atmospheric Water Data Set from the A-Train

• Bill Rossow - ISCCP

• Robert Joyce - CMORPH

• George Huffman - GPCP2

• Matt Rodell - GLDAS
CEWIS Prototype for 3-Hourly Data


Giovanni Data Visualization and Analysis


The CEWIS Giovanni 3-hourly instance provides a window to the data sets used in the CEWIS Prototype, including a sampling of relevant parameters 
and services. 